Analysis of High Dimensional Compositional Data Containing Structural Zeros with Applications to Microbiome Data

Authors

  • Abhishek Kaul
  • Ori Davidov
  • Shyamal D. Peddada
Abstract

This paper is motivated by the recent interest in the analysis of high dimensional microbiome data. A key feature of these data is the presence of ‘structural zeros’, which are microbes missing from an observation vector due to an underlying biological process and not due to error in measurement. Typical notions of missingness are insufficient to model these structural zeros. We define a general framework which allows for structural zeros in the model and propose methods of estimating sparse high dimensional covariance and precision matrices under this setup. We establish error bounds in the spectral and Frobenius norms for the proposed estimators and empirically support them with a simulation study. We also apply the proposed methodology to the global human gut microbiome data of Yatsunenko (2012).
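The idea of estimating a sparse covariance matrix when some entries are structurally absent can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes a known presence mask, computes each covariance entry only from observations in which both variables are present, and then applies a plain hard threshold to the off-diagonal entries. The function names `pairwise_covariance` and `hard_threshold`, the threshold `level`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def pairwise_covariance(x, present):
    """Covariance where entry (j, k) uses only rows in which both j and k are present.

    x       : (n, p) array of measurements (values at absent entries are ignored)
    present : (n, p) array, nonzero where the entry is NOT a structural zero
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(present, dtype=float)
    xz = np.where(w > 0, x, 0.0)
    n_j = w.sum(axis=0)                        # rows in which each variable is present
    mean = xz.sum(axis=0) / np.maximum(n_j, 1.0)
    centered = (xz - mean) * w                 # absent entries contribute nothing
    n_jk = w.T @ w                             # co-presence count for each pair
    cov = (centered.T @ centered) / np.maximum(n_jk, 1.0)
    return cov, n_jk

def hard_threshold(cov, level):
    """Zero out off-diagonal entries smaller than `level` in absolute value."""
    out = np.where(np.abs(cov) >= level, cov, 0.0)
    np.fill_diagonal(out, np.diag(cov))        # the diagonal is never thresholded
    return out

# Simulated example: independent columns with about 20% structural zeros.
rng = np.random.default_rng(0)
n, p = 200, 5
x = rng.normal(size=(n, p))
present = rng.random((n, p)) > 0.2
x[~present] = 0.0

cov, counts = pairwise_covariance(x, present)
sparse_cov = hard_threshold(cov, level=0.3)    # `level` is an arbitrary choice here
```

Dividing by the co-presence count rather than the full sample size means each entry is based on a different effective sample size, which is one reason error bounds in this setting need separate treatment.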

Similar resources

Covariance-Based Outlier Detection for Compositional Data with Structural Zeros: Application to Italian Survey of Household Income and Wealth Data

Outlier detection is an important task for the statistical analysis of multivariate data, because often the outliers contain important information about the data structure. In compositional data, represented usually as proportions subject to a unit sum constraint, the ratios between the parts (variables) contain the essential information. This inherent property is, however, incompatible with th...

Temporal probabilistic modeling of bacterial compositions derived from 16S rRNA sequencing

Motivation: The number of microbial and metagenomic studies has increased drastically due to advancements in next-generation sequencing-based measurement techniques. Statistical analysis and the validity of conclusions drawn from (time series) 16S rRNA and other metagenomic sequencing data is hampered by the presence of significant amount of noise and missing data (sampling zeros). Accounting u...

Compositional Mediation Analysis for Microbiome Studies

Motivated by recent advances in causal inference on mediation analysis and problems in the analysis of metagenomic data, we consider the effect of a treatment on an outcome transmitted through microbes, or compositional mediators. Compositional and high dimensional natures of such mediators make the standard mediation analysis not directly applicable. In this paper, we propose a method for esti...

Simulation of Smoke Emission from Fires in High-Rise Buildings Using the 3D Model Generated from 2-Dimensional Cadastral Data

Having a 3-Dimensional model of high-rise buildings can be used in disaster management such as fire cases to reduce casualties. The fundamental dilemma in 3D building modeling is the unavailability of suitable data sources. However, available cadastral 2D maps could be used as low-cost and attainable resources for 3D building modeling. Smoke will be a great threat to people's health during a f...

Publication date: 2016